First, we load the necessary libraries.
library(ggplot2)
library(dplyr)
library(tidyverse)
library(ggmap)
library(knitr)
library(kableExtra)
library(corrplot)
library(scales)
library(RColorBrewer)
library(plotly)
Then, we read in the CSV file containing the data.
airbnbdf <- read.csv(file = 'data/ab-nyc-2019.csv')
With the data loaded, we remove the columns that are not needed for the analysis.
airbnbdf <- subset(airbnbdf, select = -c(name, id, last_review, host_name))
The data contains NA values that must be handled before the analysis. The only column with NA values is “reviews_per_month”, and each NA occurs in a row where “number_of_reviews” is 0: a listing that has never been reviewed has no monthly review rate. Because zero reviews implies zero reviews per month, the NA values are replaced with 0.
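# Columns that contain at least one NA
naCols <- airbnbdf[colSums(is.na(airbnbdf)) > 0]
colnames(naCols)
## [1] "reviews_per_month"
# Rows that contain at least one NA, kept for inspection
na_values <- airbnbdf[rowSums(is.na(airbnbdf)) > 0, ]
# Replace every NA with 0 (only reviews_per_month is affected)
airbnbdf[is.na(airbnbdf)] <- 0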
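The claim that every missing “reviews_per_month” value belongs to a listing with zero reviews can be checked directly. The following is a minimal sketch; the helper name `na_only_without_reviews` is hypothetical, and it only uses the two column names mentioned above.

```r
# Hypothetical helper: TRUE when every row with a missing
# reviews_per_month value also has number_of_reviews equal to 0.
na_only_without_reviews <- function(df) {
  missing <- is.na(df$reviews_per_month)
  all(df$number_of_reviews[missing] == 0)
}
```

Since the NA values in `airbnbdf` have already been replaced at this point, the check would be applied to the saved rows instead, for example `na_only_without_reviews(na_values)`.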
Next, we check what the first ten rows of the cleaned data look like.
head(airbnbdf, 10)
Summary stats of the dataset:
| Average Price | Median Price | Minimum Price | Maximum Price | St Dev of Price | Number of Reviews | Total Number of Airbnbs |
|---|---|---|---|---|---|---|
| 152.72 | 106 | 0 | 10000 | 240.15 | 1138005 | 48895 |
Summary stats of the dataset by borough:
| Borough | Average Price | Median Price | Minimum Price | Maximum Price | St Dev of Price | Number of Reviews | Total Number of Airbnbs |
|---|---|---|---|---|---|---|---|
| Bronx | 87.50 | 65 | 0 | 2500 | 106.71 | 28371 | 1091 |
| Brooklyn | 124.38 | 90 | 0 | 10000 | 186.87 | 486574 | 20104 |
| Manhattan | 196.88 | 150 | 0 | 10000 | 291.38 | 454569 | 21661 |
| Queens | 99.52 | 75 | 10 | 10000 | 167.10 | 156950 | 5666 |
| Staten Island | 114.81 | 75 | 13 | 5000 | 277.62 | 11541 | 373 |
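The code that produced the two summary tables is not shown, so the following is only a sketch of how they could be computed with dplyr. It assumes the price column is named `price` and the borough column is named `neighbourhood_group`; neither name appears in the text above, and the helper `price_summary` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: price and review summary, optionally grouped.
# Assumes columns `price` and `number_of_reviews` exist in `df`.
price_summary <- function(df, ...) {
  df %>%
    group_by(...) %>%                      # no arguments means no grouping
    summarise(
      avg_price     = round(mean(price), 2),
      median_price  = median(price),
      min_price     = min(price),
      max_price     = max(price),
      sd_price      = round(sd(price), 2),
      total_reviews = sum(number_of_reviews),
      n_listings    = n(),
      .groups = "drop"
    )
}
```

Under these assumptions, `price_summary(airbnbdf)` would give the overall table and `price_summary(airbnbdf, neighbourhood_group)` the per-borough table; either result can be passed to `kable()` for display.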
Scatterplot of the Airbnb listings in New York City, using the longitude and latitude provided for each listing:
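A plot of this kind could be drawn roughly as follows. This is a sketch, not the original plotting code: it assumes the coordinate columns are named `longitude` and `latitude` and the borough column `neighbourhood_group`, and it draws no basemap. The function name `listing_scatter` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: one point per listing, coloured by borough.
# Assumes columns `longitude`, `latitude` and `neighbourhood_group`.
listing_scatter <- function(df) {
  ggplot(df, aes(x = longitude, y = latitude, colour = neighbourhood_group)) +
    geom_point(alpha = 0.3, size = 0.5) +  # transparency eases overplotting
    coord_quickmap() +                     # approximate map aspect ratio
    labs(x = "Longitude", y = "Latitude", colour = "Borough")
}
```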
Because the scatterplot is crowded where listings overlap, a density map shows their concentration more clearly:
Density map for each borough:
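Both density maps could be sketched with a two-dimensional kernel density estimate, faceted by borough for the second one. The column names `longitude`, `latitude` and `neighbourhood_group` are assumptions, the function name `listing_density` is hypothetical, and the original may have drawn the density over a ggmap basemap, which is omitted here.

```r
library(ggplot2)

# Hypothetical helper: filled 2D density contours of listing locations.
# Set by_borough = TRUE to draw one panel per borough.
listing_density <- function(df, by_borough = FALSE) {
  p <- ggplot(df, aes(x = longitude, y = latitude)) +
    stat_density_2d(aes(fill = after_stat(level)),
                    geom = "polygon", alpha = 0.4) +
    scale_fill_viridis_c(name = "Density level") +
    labs(x = "Longitude", y = "Latitude")
  if (by_borough) {
    # Free scales let each borough fill its own panel
    p <- p + facet_wrap(~ neighbourhood_group, scales = "free")
  }
  p
}
```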
Bar graph of the number of Airbnb listings per borough:
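A count per borough needs no prior aggregation, because `geom_bar()` counts rows by default. The sketch below assumes the borough column is named `neighbourhood_group`; the function name is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: number of listings in each borough.
listings_per_borough <- function(df) {
  ggplot(df, aes(x = neighbourhood_group)) +
    geom_bar(fill = "steelblue") +         # default stat counts the rows
    labs(x = "Borough", y = "Number of listings")
}
```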
Bar graph of the average price of an Airbnb listing per borough:
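One way to obtain this chart is to compute the mean price per borough first and then plot the aggregated values with `geom_col()`. This is a sketch under the assumption that the columns are named `neighbourhood_group` and `price`; both function names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: mean price for each borough.
mean_price_by_borough <- function(df) {
  df %>%
    group_by(neighbourhood_group) %>%
    summarise(mean_price = mean(price), .groups = "drop")
}

# Hypothetical helper: bars ordered from most to least expensive borough.
plot_mean_price_by_borough <- function(df) {
  ggplot(mean_price_by_borough(df),
         aes(x = reorder(neighbourhood_group, -mean_price), y = mean_price)) +
    geom_col(fill = "steelblue") +
    labs(x = "Borough", y = "Mean price")
}
```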
Bar graph of the number of listings of each room type per borough:
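Splitting the counts by room type can be sketched by mapping the room type to the fill colour and placing the bars side by side. The column names `neighbourhood_group` and `room_type` are assumptions, and the function name is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: listing counts per borough, one bar per room type.
room_types_per_borough <- function(df) {
  ggplot(df, aes(x = neighbourhood_group, fill = room_type)) +
    geom_bar(position = "dodge") +         # side-by-side instead of stacked
    labs(x = "Borough", y = "Number of listings", fill = "Room type")
}
```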
Bar graph of the mean price for each room type:
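As an alternative to aggregating beforehand, the mean can be computed inside the plot with `stat_summary()`. The sketch assumes columns named `room_type` and `price`; the function name is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: mean price per room type, computed by the plot itself.
mean_price_by_room_type <- function(df) {
  ggplot(df, aes(x = room_type, y = price)) +
    stat_summary(fun = mean, geom = "bar", fill = "steelblue") +
    labs(x = "Room type", y = "Mean price")
}
```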
Bar graph of the total number of reviews per borough:
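The review totals could be obtained with a weighted `count()` and plotted with thousands separators on the axis, since the totals run into the hundreds of thousands. This is a sketch assuming the borough column is named `neighbourhood_group`; the function name is hypothetical.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Hypothetical helper: total number of reviews in each borough.
reviews_per_borough <- function(df) {
  totals <- df %>%
    count(neighbourhood_group, wt = number_of_reviews, name = "total_reviews")
  ggplot(totals, aes(x = neighbourhood_group, y = total_reviews)) +
    geom_col(fill = "steelblue") +
    scale_y_continuous(labels = comma) +   # e.g. 486,574 instead of 486574
    labs(x = "Borough", y = "Total number of reviews")
}
```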
Scatterplot of the number of reviews vs. the price of each listing. The plot shows no apparent correlation between the number of reviews a listing has and its price.
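The visual impression can be backed by a number. The sketch below draws the scatterplot and computes the Pearson correlation coefficient between the two variables; it assumes the price column is named `price`, and both function names are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: number of reviews against price, one point per listing.
reviews_vs_price <- function(df) {
  ggplot(df, aes(x = number_of_reviews, y = price)) +
    geom_point(alpha = 0.3) +
    labs(x = "Number of reviews", y = "Price")
}

# Hypothetical helper: Pearson correlation between reviews and price.
reviews_price_cor <- function(df) {
  cor(df$number_of_reviews, df$price)
}
```

A coefficient close to 0 would be consistent with the reading of the plot, although it only measures linear association.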
Correlation matrix. According to the correlation matrix, no pair of variables appears to be strongly correlated.
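A correlation matrix like this could be built from the numeric columns and drawn with corrplot. Which variables were included in the original is not stated, so the sketch simply keeps every numeric column; the function names and the styling arguments are illustrative choices.

```r
library(corrplot)

# Hypothetical helper: pairwise Pearson correlations of all numeric columns.
numeric_cor <- function(df) {
  num <- df[sapply(df, is.numeric)]
  cor(num, use = "complete.obs")
}

# Hypothetical helper: draw the upper triangle with coefficients printed.
plot_numeric_cor <- function(df) {
  corrplot(numeric_cor(df), method = "color", type = "upper",
           addCoef.col = "black", tl.col = "black")
}
```

Printing the coefficients makes it easier to judge each pair individually, for example `plot_numeric_cor(airbnbdf)`. Identifier-like numeric columns, if any remain in the data, would usually be dropped first.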